Mega-Reward: Achieving Human-Level Play without Extrinsic Rewards
Intrinsic rewards were introduced to simulate how human intelligence works;
they are usually evaluated by intrinsically-motivated play, i.e., playing games
without extrinsic rewards while still being evaluated with them. However, no
existing intrinsic reward approach achieves human-level performance
under this very challenging setting of intrinsically-motivated play. In this
work, we propose a novel megalomania-driven intrinsic reward (called
mega-reward), which, to our knowledge, is the first approach that achieves
human-level performance in intrinsically-motivated play. Intuitively,
mega-reward comes from the observation that infants' intelligence develops when
they try to gain more control over entities in an environment; therefore,
mega-reward aims to maximize the control capabilities of agents over given
entities in a given environment. To formalize mega-reward, a relational
transition model is proposed to bridge the gap between direct and latent
control. Experimental studies show that mega-reward (i) greatly outperforms
all state-of-the-art intrinsic reward approaches, (ii) generally achieves the
same level of performance as Ex-PPO and professional human-level scores, and
(iii) also performs better when incorporated with extrinsic rewards.
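To make the control-maximization idea concrete, below is a minimal sketch of a control-based intrinsic reward in Python, assuming a learned forward model that predicts per-entity features; the function and argument names are illustrative assumptions, not the paper's actual relational transition model.

```python
import numpy as np

def control_intrinsic_reward(forward_model, state, actions):
    """Hypothetical control-based intrinsic reward (illustrative only).
    Assumes forward_model(state, action) -> predicted next-step entity
    features, shape (num_entities, feature_dim)."""
    preds = np.stack([forward_model(state, a) for a in actions])  # (A, N, D)
    # If switching actions changes an entity's predicted next state, the
    # agent exerts some (direct or latent) control over that entity.
    per_entity_control = preds.var(axis=0).mean(axis=-1)          # (N,)
    return float(per_entity_control.sum())
```

In the paper itself, the relational transition model additionally bridges direct control (entities the agent moves itself) and latent control (entities influenced indirectly).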
Arena: A General Evaluation Platform and Building Toolkit for Multi-Agent Intelligence
Learning agents that are capable not only of taking tests but also of
innovating is becoming a hot topic in AI. One of the most promising paths
towards this vision is multi-agent learning, where agents act as the
environment for each other, and improving each agent means proposing new
problems for others. However, existing evaluation platforms are either not
compatible with multi-agent settings or limited to a specific game. That is,
there is not yet a general evaluation platform for research on multi-agent
intelligence. To this end, we introduce Arena, a general evaluation platform
for multi-agent intelligence with 35 games of diverse logics and
representations. Furthermore, multi-agent intelligence is still at the stage
where many problems remain unexplored. Therefore, we provide a building toolkit
for researchers to easily invent and build novel multi-agent problems from the
provided game set based on a GUI-configurable social tree and five basic
multi-agent reward schemes. Finally, we provide Python implementations of five
state-of-the-art deep multi-agent reinforcement learning baselines. Along with
the baseline implementations, we release a set of 100 best agents/teams
trained with different training schemes for each game, serving as a basis for
evaluating agents by population performance. As such, the research community
can perform comparisons under a stable and uniform standard. All the
implementations and accompanying tutorials have been open-sourced for the
community at https://sites.google.com/view/arena-unity/.
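The abstract does not show Arena's Python API, but the evaluation it describes follows the standard multi-agent loop; the sketch below uses hypothetical class and method names (RandomPolicy, reset, step), not Arena's real interface.

```python
# Generic multi-agent evaluation loop; all environment method names here
# (reset, step) are assumptions, not Arena's documented API.
import random

class RandomPolicy:
    """Baseline policy that samples uniformly from a discrete action set."""
    def __init__(self, n_actions):
        self.n_actions = n_actions

    def act(self, observation):
        return random.randrange(self.n_actions)

def evaluate(env, policies, episodes=10):
    """Average per-agent episode return, the population-performance
    style of comparison such a platform is built around."""
    totals = [0.0] * len(policies)
    for _ in range(episodes):
        observations = env.reset()          # one observation per agent
        done = False
        while not done:
            actions = [p.act(o) for p, o in zip(policies, observations)]
            observations, rewards, done, _ = env.step(actions)
            for i, r in enumerate(rewards): # per-agent reward vector
                totals[i] += r
    return [t / episodes for t in totals]
```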
Packet dropping characteristics in a queue with autocorrelated arrivals
This paper provides a detailed description of the packet dropping process caused by buffer overflows in a network node. In particular, we derive formulas for the most important loss characteristics, in both the transient and stationary regimes, and illustrate them with numerical examples. To obtain the dropping characteristics for strongly autocorrelated arrivals, the Markov-modulated Poisson process is used as the traffic model.
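The paper derives analytic formulas rather than simulating; still, a Monte Carlo sketch can make the setup concrete. The following Python snippet estimates the stationary drop probability of a finite-buffer queue fed by a two-state MMPP via competing exponential clocks; all parameter values are illustrative, not taken from the paper.

```python
import random

def simulate_mmpp_queue(lam=(0.5, 4.0), phase_rates=(0.02, 0.05),
                        mu=2.0, buffer_size=10, horizon=1e5, seed=1):
    """Estimate the stationary packet-drop probability of a finite-buffer
    queue fed by a 2-state Markov-modulated Poisson process (MMPP).
    lam[i]        : Poisson arrival rate while the modulating chain is in state i
    phase_rates[i]: rate of leaving state i (two states, so the destination
                    is always the other state)
    mu            : exponential service rate; buffer_size : total capacity."""
    rng = random.Random(seed)
    t, phase, queue = 0.0, 0, 0
    arrived = dropped = 0
    while t < horizon:
        rates = [lam[phase], phase_rates[phase]]
        if queue > 0:
            rates.append(mu)                 # service only when non-empty
        total = sum(rates)
        t += rng.expovariate(total)          # time to next event
        u = rng.random() * total             # pick event proportional to rate
        if u < rates[0]:                     # packet arrival
            arrived += 1
            if queue < buffer_size:
                queue += 1
            else:
                dropped += 1                 # buffer overflow: packet lost
        elif u < rates[0] + rates[1]:        # modulating chain changes state
            phase = 1 - phase
        else:                                # service completion
            queue -= 1
    return dropped / arrived

print(simulate_mmpp_queue())
```

Setting the two arrival rates far apart with slow phase switching, as above, is what produces the strong autocorrelation in the arrival stream that the MMPP model is chosen to capture.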